    Improving selection stability of multiple testing procedures for fMRI

    In search of an appropriate thresholding technique for the analysis of functional MRI data, several methods to prevent an inflation of false positives have been proposed. Two popular (voxelwise) methods are the Bonferroni procedure (BF), which controls the familywise error rate (FWER), and the Benjamini-Hochberg procedure (BH), which controls the false discovery rate (FDR) (Benjamini & Hochberg, 1995). Multiple testing procedures are typically evaluated on their average performance with respect to error rates, ignoring the aspect of variability. Resampling techniques make it possible to assess the selection variability of individual features (voxels). Following the approach of Gordon, Chen, Glazko & Yakovlev (2009) in the context of gene selection, we investigated whether the variability in test results for BF and BH can be reduced by including both the significance and the selection variability of the voxels in the decision criterion.
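    As a point of reference for the two procedures, here is a minimal sketch in Python (NumPy only; an illustration, not the authors' implementation) that applies BF and BH to a flat array of voxelwise p-values.

        import numpy as np

        def bonferroni(pvals, alpha=0.05):
            # FWER control: reject H0 where p <= alpha / (number of tests).
            return pvals <= alpha / pvals.size

        def benjamini_hochberg(pvals, q=0.05):
            # FDR control via the step-up rule: find the largest k with
            # p_(k) <= q * k / m and reject the k smallest p-values.
            m = pvals.size
            order = np.argsort(pvals)
            below = pvals[order] <= q * np.arange(1, m + 1) / m
            reject = np.zeros(m, dtype=bool)
            if below.any():
                k = np.nonzero(below)[0].max()
                reject[order[:k + 1]] = True
            return reject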

    Maximized likelihood ratio tests for functional localization in fMRI

    fMRI localizer tasks are often used to define subject-specific functional regions of interest (fROIs) that contain the relevant features for subsequent analyses. fROIs are typically small and show large interindividual differences in extent and effect size. As statistical testing procedures focus on controlling false positives, this may lead to an ad-hoc adjustment of thresholds in some individuals. The promising likelihood ratio (LR) testing approach for fMRI (Kang et al., 2015) provides simultaneous control of both false positives and false negatives by contrasting evidence in favor of true activation against evidence in favor of the null hypothesis. The authors propose to estimate the expected alternative by a percentile (e.g., the 95th) across the voxels of an effect size map. However, in the context of fROIs, pre-defined observed percentiles may induce inconsistent activation across subjects. In this study we show the potential of a maximized LR approach (Bickel, 2012) for this particular application. The maximum LR is calculated over the same interval of functionally relevant alternatives for all subjects, enabling consistent localization of the fROIs in subjects with both low and high levels of general activity.
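    To make the idea concrete, the sketch below computes a maximized LR for a voxelwise z-statistic under a unit-variance Gaussian model, maximizing over an interval of alternative means. The interval bounds (0.5 to 3.0) and the evidence cutoff are illustrative assumptions, not values from the study.

        import numpy as np
        from scipy.stats import norm

        def lr_maximized(z, lo=0.5, hi=3.0):
            # LR of N(delta, 1) against N(0, 1), maximized over delta in
            # [lo, hi]; for a Gaussian the maximizer is clip(z, lo, hi).
            d = np.clip(z, lo, hi)
            return norm.pdf(z, loc=d) / norm.pdf(z, loc=0.0)

        z_map = np.random.default_rng(0).normal(size=1000)  # toy statistic map
        active = lr_maximized(z_map) > 10.0  # illustrative evidence threshold

    Because the same interval of alternatives is used for every subject, the evidence scale is comparable across subjects regardless of their overall activity level.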

    Data-analytical stability in second-level fMRI inference

    We investigate the impact of decisions in the second-level (i.e. over subjects) inferential process in functional Magnetic Resonance Imaging (fMRI) on 1) the balance between false positives and false negatives and on 2) the data-analytical stability (Qiu et al., 2006; Roels et al., 2015), both proxies for the reproducibility of results. Second-level analysis based on a mass univariate approach typically consists of 3 phases. First, one proceeds via a general linear model for a test image that consists of pooled information from different subjects (Beckmann et al., 2003). We evaluate models that take into account first-level (within-subjects) variability and models that do not. Second, one proceeds via permutation-based inference or via inference based on parametric assumptions (Holmes et al., 1996). Third, we evaluate 3 commonly used procedures to address the multiple testing problem: family-wise error rate correction, false discovery rate correction, and a two-step procedure with a minimal cluster size (Lieberman and Cunningham, 2009; Bennett et al., 2009). Based on a simulation study and on real data, we find that the two-step procedure with minimal cluster size yields the most stable results, followed by the family-wise error rate correction. The false discovery rate correction yields the most variable results, both for permutation-based inference and parametric inference. Modeling the subject-specific variability yields a better balance between false positives and false negatives when using parametric inference.
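    For the second phase, one common permutation scheme at the second level is sign flipping of subject-level contrast images: under the null hypothesis of a zero-mean, symmetric effect, randomly flipping the sign of each subject's contrast map is a valid permutation. The sketch below (Python/NumPy) illustrates this general technique with a max-statistic null distribution; it is an illustration of the approach, not the pipeline used in the study.

        import numpy as np

        def sign_flip_maxT(contrasts, n_perm=1000, rng=None):
            # contrasts: subjects x voxels array of first-level estimates.
            rng = np.random.default_rng(rng)
            n = contrasts.shape[0]
            t_obs = contrasts.mean(0) / (contrasts.std(0, ddof=1) / np.sqrt(n))
            max_null = np.empty(n_perm)
            for i in range(n_perm):
                flipped = rng.choice([-1.0, 1.0], size=(n, 1)) * contrasts
                t = flipped.mean(0) / (flipped.std(0, ddof=1) / np.sqrt(n))
                max_null[i] = t.max()
            # FWE-corrected p-values from the max-statistic distribution.
            p_fwe = (max_null[None, :] >= t_obs[:, None]).mean(axis=1)
            return t_obs, p_fwe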

    Evaluation of bootstrap procedures for fMRI data

    Over the last decade the bootstrap procedure has gained popularity in the statistical analysis of neuroimaging data. This powerful procedure can be used, for example, in the non-parametric analysis of neuroimaging data. As fMRI data have a complex structure with both temporal and spatial dependencies, such bootstrap procedures may indeed offer an elegant solution. However, a thorough investigation of the most appropriate bootstrapping procedure for fMRI data has, to our knowledge, never been performed. Friman and Westin (2005) showed that a bootstrap procedure based on pre-whitening the temporal structure of fMRI time series is superior to procedures based on wavelets or Fourier decomposition of the signal, especially in the case of blocked fMRI designs. For time series, however, several bootstrap schemes can be exploited (see, e.g., Lahiri, 2003). For the resampling of residuals from general linear models fitted on fMRI data, we examine more specifically the differences between 1) bootstrapping pre-whitened residuals, which relies on parametric assumptions about the temporal structure, 2) a blocked bootstrap, which avoids making such assumptions (with several variants, such as the circular bootstrap), and 3) a combination of both bootstrap procedures. We furthermore explore whether the bootstrap procedure is best applied before or after smoothing the volume of interest. Based on real data and simulation studies, we discuss the temporal and spatial properties of the bootstrapped volumes for all possible combinations and find interesting differences.
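    The first two schemes can be sketched for a single voxel's residual series as follows (Python/NumPy). The AR(1) model for pre-whitening and the block length are illustrative choices, not necessarily those of the study.

        import numpy as np

        def bootstrap_prewhitened(resid, rng):
            # Scheme 1: estimate an AR(1) coefficient, whiten the residuals,
            # resample the innovations i.i.d., then re-introduce the AR(1)
            # structure (a parametric assumption about the noise).
            rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
            innov = resid[1:] - rho * resid[:-1]
            new = np.empty_like(resid)
            new[0] = resid[0]
            draws = rng.choice(innov, size=resid.size - 1, replace=True)
            for t in range(1, resid.size):
                new[t] = rho * new[t - 1] + draws[t - 1]
            return new

        def bootstrap_circular_block(resid, block_len, rng):
            # Scheme 2: circular block bootstrap; blocks of consecutive
            # residuals preserve the dependence without an AR model.
            n = resid.size
            starts = rng.integers(0, n, size=int(np.ceil(n / block_len)))
            idx = np.concatenate([(s + np.arange(block_len)) % n for s in starts])
            return resid[idx[:n]]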

    Evaluation of second-level inference in fMRI analysis

    We investigate the impact of decisions in the second-level (i.e., over subjects) inferential process in functional magnetic resonance imaging on (1) the balance between false positives and false negatives and on (2) the data-analytical stability, both proxies for the reproducibility of results. Second-level analysis based on a mass univariate approach typically consists of 3 phases. First, one proceeds via a general linear model for a test image that consists of pooled information from different subjects. We evaluate models that take into account first-level (within-subjects) variability and models that do not. Second, one proceeds via inference based on parametric assumptions or via permutation-based inference. Third, we evaluate 3 commonly used procedures to address the multiple testing problem: familywise error rate correction, False Discovery Rate (FDR) correction, and a two-step procedure with a minimal cluster size. Based on a simulation study and real data, we find that the two-step procedure with minimal cluster size yields the most stable results, followed by the familywise error rate correction. The FDR correction yields the most variable results, for both permutation-based inference and parametric inference. Modeling the subject-specific variability yields a better balance between false positives and false negatives when using parametric inference.
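    The two-step procedure that comes out as most stable can be sketched as follows (Python with SciPy). The primary threshold of p < 0.001 and the 10-voxel extent are common illustrative choices, not necessarily those used in the study.

        import numpy as np
        from scipy import ndimage
        from scipy.stats import norm

        def two_step_threshold(z_map, p_primary=0.001, min_cluster=10):
            # Step 1: voxelwise threshold at an uncorrected p-value.
            supra = z_map > norm.isf(p_primary)
            # Step 2: keep only connected clusters with at least
            # min_cluster voxels.
            labels, n_clusters = ndimage.label(supra)
            keep = np.zeros_like(supra)
            for c in range(1, n_clusters + 1):
                mask = labels == c
                if mask.sum() >= min_cluster:
                    keep |= mask
            return keep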

    Stability based testing for the analysis of fMRI data

    Neurological imaging has become increasingly important in the field of psychological research. The leading technique is functional magnetic resonance imaging (fMRI), in which a correlate of the oxygen level in the blood is measured (the BOLD signal). In an fMRI experiment, a time series of brain images is taken while participants perform a certain task. By comparing different conditions, the task-related areas in the brain can be localised. An fMRI study leads to enormous amounts of data. To analyse the data adequately, the brain images are divided into a large number of volume units (or voxels). Subsequently, a time series of the measured signal is modelled voxelwise as a linear combination of different signal components, after which an indication of activation can be tested in each voxel. This encompasses an enormous number of simultaneous statistical tests (approximately 250,000 voxels). As a result, the multiple testing problem is a serious challenge for the analysis of fMRI data. In this context, classical multiple testing procedures such as Bonferroni and Benjamini-Hochberg (Benjamini & Hochberg, 1995) have been applied to control the family-wise error rate (FWER) and the false discovery rate (FDR), respectively (Genovese, Lazar, & Nichols, 2002). Random Field Theory (Worsley, Evans, Marrett, & Neelin, 1992) controls the FWER while accounting for the spatial character of the data. Because of the dramatic decrease in power when controlling the FWER, methods to control the topological FDR were developed (Chumbley & Friston, 2009; Heller, Stanley, Yekutieli, Rubin, & Benjamini, 2006). A general shortcoming of current procedures is the focus on detecting non-null activation, while a non-null effect is not necessarily biologically relevant. Moreover, failing to reject the hypothesis of no activation is not the same as confidently excluding important effects. Another aspect that remains largely unexplored is the stability of test results, which can be defined as the selection variability of individual voxels (Qiu, Xiao, Gordon, & Yakovlev, 2006). Given the need to control both false positives (type I errors) and false negatives (type II errors) in a direct manner (Lieberman & Cunningham, 2009), we approach the multiple testing problem from a different angle. Following the procedure of Gordon, Chen, Glazko, and Yakovlev (2009) in the context of gene selection, we present a statistical method to detect brain activation that not only includes information on false positives, but also on power and stability. The method uses bootstrap resampling to extract information on stability and uses this information to detect the most reliable voxels in relation to the experiment. The findings indicate that the method can improve the stability of procedures and allows a direct trade-off between type I and type II errors. In this particular setting, it is shown how the proposed method enables researchers to adapt classical procedures while improving their stability. The method is evaluated and illustrated using simulation studies and a real data example.
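    A minimal sketch of the resampling idea (Python/NumPy): bootstrap the subjects, re-run a given testing procedure on each replicate, and retain voxels that are both significant on the full data and selected in a large fraction of replicates. The test_fn interface and the 0.8 frequency threshold are illustrative assumptions, not the authors' exact decision rule.

        import numpy as np

        def stable_selection(data, test_fn, n_boot=500, freq=0.8, rng=None):
            # data: subjects x voxels; test_fn maps a (resampled) data set
            # to a boolean vector of selected voxels.
            rng = np.random.default_rng(rng)
            n = data.shape[0]
            counts = np.zeros(data.shape[1])
            for _ in range(n_boot):
                counts += test_fn(data[rng.integers(0, n, size=n)])
            significant = test_fn(data)          # selected on the full data
            return significant & (counts / n_boot >= freq)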

    Adaptive thresholding for fMRI data

    In the analysis of functional MRI data, several thresholding procedures are available to account for the huge number of volume units or features that are tested simultaneously. The main focus of these methods is to prevent an inflation of false positives. However, this comes with a serious decrease in power and therefore leads to a problematic imbalance between type I and type II errors (Lieberman & Cunningham, 2009). In this research, we present a method to estimate the number of activated features. The goal is twofold:
    • Given the expected number of active units, widely used methods to control the false discovery rate (FDR) can be made adaptive and more powerful.
    • The type I and type II error rates following such a thresholding technique can be estimated, enabling a direct trade-off between sensitivity and specificity.
    Chen, Wang, Eberly, Caffo, & Schwartz (2009) argue that activation foci in fMRI data are often small and local, leading to a large proportion of null voxels. However, task-related activation is expected to occur in clusters of voxels rather than in isolated single voxels. We consider peaks of activation instead of voxels and provide a procedure to estimate the number of active peaks. Concentrating on peaks leads to an enormous data reduction, and the proportion of non-null hypotheses can be expected to be much larger among peaks than among voxels. Given an estimate of the number of active and non-active peaks, we demonstrate how an adaptive FDR controlling procedure on peaks can be obtained and how the false positive and false negative rates associated with this procedure can be estimated. This allows researchers to reconsider the balance between sensitivity and specificity in function of study goals. The method is evaluated and illustrated using simulation studies and a real data example.
    References
    Chen, S., Wang, C., Eberly, L., Caffo, B., & Schwartz, B. (2009). Adaptive control of the false discovery rate in voxel-based morphometry. Human Brain Mapping, 30, 2304-2311.
    Lieberman, M. D., & Cunningham, W. A. (2009). Type I and Type II error concerns in fMRI research: Re-balancing the scale. Social Cognitive and Affective Neuroscience, 4, 423-428.
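    The adaptive step can be sketched as follows (Python/NumPy). Here a simple Storey-type estimate of the null proportion stands in for the authors' peak-based estimator, and BH is run at the inflated level q / pi0; the lambda parameter is an illustrative choice.

        import numpy as np

        def storey_pi0(pvals, lam=0.5):
            # Estimate the proportion of null hypotheses from the p-values
            # above lambda; clip away from zero to keep q / pi0 finite.
            pi0 = np.mean(pvals > lam) / (1.0 - lam)
            return float(np.clip(pi0, 1.0 / pvals.size, 1.0))

        def adaptive_bh(pvals, q=0.05):
            # Benjamini-Hochberg run at level q / pi0: more rejections when
            # many hypotheses (here, peaks) are estimated to be non-null.
            m = pvals.size
            order = np.argsort(pvals)
            below = pvals[order] <= (q / storey_pi0(pvals)) * np.arange(1, m + 1) / m
            reject = np.zeros(m, dtype=bool)
            if below.any():
                reject[order[:np.nonzero(below)[0].max() + 1]] = True
            return reject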

    Another perspective on voxel-wise multiple testing procedures in fMRI

    Introduction: We review 3 widely used voxel-wise approaches to thresholding images of test statistics: Bonferroni (BF), Gaussian random field (GRF) and Benjamini-Hochberg (BH). While the latter controls the false discovery rate (FDR), the first two control the family-wise error rate (FWE). Comparisons of multiple testing procedures (MTPs) in the neuroimaging literature have typically focused on sensitivity and specificity. However, stability (Gordon et al., 2007) is another important operating characteristic that needs to be taken into account. Here, we define stability as the variability due to the MTP in the detection of truly activated voxels.
    Methods: Following Marchini and Presanis (2004), we simulated 3D Gaussian random fields using a FWHM ranging from 20 mm to 50 mm and voxel dimensions of 4 × 4 × 6 mm. To these “null” SPMs, we added positive activation by simulating an extra GRF and transforming all voxels marginally to have a Gamma(k, 1) distribution (k ranging from 3 to 7). The images had dimensions 40 × 40 × 10, and of the 16,000 voxels, 400 were positively activated. Each simulation setting was repeated 1000 times. First, the performance of BF and GRF was compared at fixed theoretical levels of the FWE (ranging from 0.01 to 0.10). Next, to allow for a fair comparison of BF and GRF with BH, thresholds were determined for each procedure that result in an equal empirical FDR on average (for different levels of the FDR ranging from 0.01 to 0.10). Using these thresholds that equalized the FDR on average, we explored the number of true discoveries and its variability for each MTP.
    Results: When equalizing the theoretical FWE, GRF outperforms BF in terms of the mean number of true discoveries, but tends to be more variable with decreasing effect size (for the range of smoothness values considered in this simulation setting). Figures 1 and 2 show the standard deviation of the true discoveries as a function of the mean for 10 levels of the FWE (0.01 to 0.10 in steps of 0.01) for large and small effects respectively (each symbol representing a different FWE). Overall, the coefficient of variation (standard deviation divided by mean) is smaller for GRF than for BF. When equalizing the empirical FDR, BF and GRF perform identically (as they use exactly the same ordering of p-values). When the effect size is large (small), BH detects fewer (more) truly activated voxels than BF = GRF, regardless of the smoothness considered. In all scenarios, the variability in the number of detected voxels is larger with BH than with BF = GRF. Figures 3 and 4 show the standard deviation of the true discoveries as a function of the mean for 10 levels of the FDR (0.01 to 0.10 in steps of 0.01) for large and small effects respectively (each symbol representing a different FDR). Overall, the coefficient of variation is smaller for GRF = BF than for BH.
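    The simulation design can be reconstructed roughly as follows (Python with SciPy). The anisotropic smoothing follows the stated voxel dimensions, but the placement of the 400 active voxels and the chosen shape parameter are illustrative assumptions, not the authors' code.

        import numpy as np
        from scipy.ndimage import gaussian_filter
        from scipy.stats import norm, gamma

        def smooth_field(shape, fwhm_mm, voxdim_mm, rng):
            # Filter white noise with a Gaussian kernel of the given FWHM
            # (per-axis sigma in voxels), then restandardize.
            sigma = fwhm_mm / (2.355 * np.asarray(voxdim_mm))
            f = gaussian_filter(rng.normal(size=shape), sigma=sigma)
            return (f - f.mean()) / f.std()

        rng = np.random.default_rng(1)
        shape, voxdim = (40, 40, 10), (4.0, 4.0, 6.0)
        null_spm = smooth_field(shape, 20.0, voxdim, rng)
        signal = smooth_field(shape, 20.0, voxdim, rng)
        active = np.zeros(shape, dtype=bool)
        active[:10, :4, :] = True          # 400 positively activated voxels
        k = 5.0                            # Gamma shape, k in [3, 7]
        spm = null_spm.copy()
        # Probability integral transform: activated voxels get Gamma(k, 1)
        # marginals while keeping the spatial structure of the extra GRF.
        spm[active] = gamma.ppf(norm.cdf(signal[active]), k)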

    Assessing publication bias in coordinate-based meta-analysis techniques?

    Introduction: While publications of fMRI studies have flourished, it is increasingly recognized that progress in understanding human brain function will require the integration of data across studies using meta-analyses. In general, results that do not reach statistical significance are less likely to be published and included in a meta-analysis. Meta-analyses of fMRI studies are prone to this publication bias when studies are excluded because they fail to show activation in specific regions. Further, some studies only report a limited number of peak voxels that survive a statistical threshold, resulting in an enormous loss of data. Coordinate-based toolboxes have been specifically developed to combine the available information of such studies in a meta-analysis. Potential publication bias then stems from two sources: exclusion of studies and missing voxel information within studies. In this study, we focus on the assessment of the first source of bias in coordinate-based meta-analyses. A measure of publication bias indicates the degree to which the analysis might be distorted and helps to interpret results. We propose an adaptation of the Fail-Safe N (FSN; Rosenthal, 1979). The FSN reflects the number of null studies, i.e. studies without activation in a target region, that can be added to an existing meta-analysis without altering the result for the target region. A large FSN indicates robustness of the effect against publication bias. On the other hand, in this context, an FSN that is too large indicates that a small number of studies might drive the entire analysis.
    Method: We simulated 1000 simplistic meta-analyses, each consisting of 3 studies with real activation in a target area (quadrant 1 in Figure 1) and up to 100 null studies with activation in the remaining 3 quadrants. We calculated the FSN as the number of null studies (with a maximum of 100) that can be added to the original meta-analysis of 3 studies without altering the results for the target area. Meta-analyses were conducted with ALE (Eickhoff et al., 2009; 2012; Turkeltaub et al., 2012). We computed the FSN using an uncorrected threshold (α = 0.001) and 2 versions of a False Discovery Rate (FDR) threshold (q = 0.05): FDR pID (which assumes independence or positive dependence between test statistics) and FDR pN (which makes no assumptions and is more conservative). We varied the average sample size n of the individual studies from small (n ≈ 10) to medium (n ≈ 20) and large (n ≈ 30).
    Results: Results are summarised in Figure 2 and visually presented in Figure 3. We find a large difference in average FSN between the different thresholding methods. In the case of uncorrected thresholding, the target region remains labeled as active while only 3% of the studies in the meta-analysis report activation at that location. Further, if the sample size of the individual studies in the meta-analysis increases, the FSN decreases.
    Conclusions: The FSN varies largely across thresholding methods and sample sizes. Uncorrected thresholding allows the analysis to be driven by a small number of studies and is therefore contraindicated. While a decreasing FSN with increasing sample size might seem counterintuitive in terms of robustness, it indicates that the analysis is less prone to being driven by a small number of studies. Publication bias assessment methods can be a valuable add-on to existing toolboxes for the interpretation of meta-analytic results.
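    The FSN computation itself reduces to a simple loop, sketched below in Python. Here run_meta (the ALE pipeline) and target_is_active are placeholders standing in for toolbox calls, not real APIs.

        def fail_safe_n(real_studies, null_studies, run_meta, target_is_active,
                        max_n=100):
            # Add null studies one at a time and re-run the meta-analysis;
            # the FSN is the largest number of added null studies for which
            # the target region is still labeled active. Assumes the target
            # is active in the original meta-analysis of the real studies.
            fsn = 0
            for n in range(1, max_n + 1):
                if target_is_active(run_meta(real_studies + null_studies[:n])):
                    fsn = n
                else:
                    break
            return fsn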
    In future work, we will extend our research to other methods for the assessment of publication bias, such as the Egger test (Egger et al., 1997) and the test for excess success (Francis, 2014).
    References
    Egger, M., Davey Smith, G., Schneider, M., and Minder, C. (1997), ‘Bias in meta-analysis detected by a simple, graphical test’, British Medical Journal, vol. 315, pp. 629-634.
    Eickhoff, S.B., Laird, A.R., Grefkes, C., Wang, L.E., Zilles, K., and Fox, P.T. (2009), ‘Coordinate-based activation likelihood estimation meta-analysis of neuroimaging data: A random-effects approach based on empirical estimates of spatial uncertainty’, Human Brain Mapping, vol. 30, pp. 2907-2926.
    Eickhoff, S.B., Bzdok, D., Laird, A.R., Kurth, F., and Fox, P.T. (2012), ‘Activation likelihood estimation revisited’, Neuroimage, vol. 59, pp. 2349-2361.
    Francis, G. (2014), ‘The frequency of excess success for articles in Psychological Science’, Psychonomic Bulletin and Review, vol. 21, no. 5, pp. 1180-1187.
    Rosenthal, R. (1979), ‘The file drawer problem and tolerance for null results’, Psychological Bulletin, vol. 86, no. 3, pp. 638-641.
    Turkeltaub, P.E., Eickhoff, S.B., Laird, A.R., Fox, M., Wiener, M., and Fox, P. (2012), ‘Minimizing within-experiment and within-group effects in activation likelihood estimation meta-analyses’, Human Brain Mapping, vol. 33, pp. 1-13.